Project Summary and Aims

The two core analysis aims are:

To do: add text from Heather’s outline.

Overview of Dataset Preparation

To do: finish this section.

Initially, data was extracted from the Australian Honours List and then matched to Wikidata and Wikipedia (see section below; TO DO: ADD LINK).

Once the initial analysis was done, Alex Lum assisted by providing an extraction of Wikipedia pages that state an Order on the page. This Order information was then merged back into the corresponding Wikidata items, which allowed us to extract further Wikipedia pages. No additional pages have been created in Wikipedia, but we have been able to get a better

Honours Data Set

The Department of the Prime Minister and Cabinet publishes a list of Australian Honours recipients. This list includes all recipients of the Order of Australia.

Records were extracted from this database for all Order of Australia awards issued since 1975, at the following award levels:

  1. Dame of the Order of Australia (AD)
  2. Knight of the Order of Australia (AK)
  3. Companion (AC)
  4. Officer (AO)
  5. Member (AM)
  6. Medal (OAM)

More information about the Order of Australia can be found here: https://en.wikipedia.org/wiki/Order_of_Australia.

While most recipients appear only once, some individuals have been awarded more than one Order of Australia. Throughout the analysis below, any figure based on the Honours data set counts awards (their number and type), not individuals. The data set contains XXXXinsert here### awards, given to XXX individuals. The number of awards per individual is summarised as follows:

(show summary of number of awards by number of people)
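One possible way to produce this summary is sketched below using only the Python standard library. It is a minimal sketch, and it assumes each award row carries a stable identifier for the recipient (the hypothetical `person_id` field). If the extract has no such ID, one would need to be derived, for example from name and other recorded details. That derivation is itself an assumption to check. The sample rows are made up for illustration.

```python
from collections import Counter

# Hypothetical award rows (one per award). The field names "person_id" and
# "award_level" are assumptions, not the actual columns of the honours extract.
awards = [
    {"person_id": "p1", "award_level": "AM"},
    {"person_id": "p1", "award_level": "AO"},
    {"person_id": "p2", "award_level": "OAM"},
    {"person_id": "p3", "award_level": "AC"},
]


def awards_per_person_summary(records):
    """Return {number_of_awards: number_of_people}, sorted by number of awards."""
    # First count how many awards each individual has received...
    per_person = Counter(r["person_id"] for r in records)
    # ...then count how many individuals share each award count.
    return dict(sorted(Counter(per_person.values()).items()))


summary = awards_per_person_summary(awards)
print(f"{len(awards)} awards given to {sum(summary.values())} individuals")
for n_awards, n_people in summary.items():
    print(f"{n_people} individual(s) received {n_awards} award(s)")
```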

Matching the Honours data set with Wikimedia information

To do: expand on process here.

  1. Extract data from Wikidata, including the Wikidata URL/ID and Wikipedia URL.
  2. Get the Wikipedia page ID for all Wikipedia articles.
  3. Use the ID to extract the page creation date.
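The three steps above could be sketched roughly as follows. This is not necessarily how the extraction was actually done. The sketch assumes the Wikidata Query Service SPARQL endpoint for step 1, filtering on the "award received" property (P166). It assumes the MediaWiki Action API for steps 2 and 3, where the creation date is taken as the timestamp of the oldest revision (`prop=revisions` with `rvdir=newer` and `rvlimit=1`). The award item ID, the User-Agent string and all function names are placeholders. The network calls are kept separate from the parsing helpers so the helpers can be checked offline.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"
WIKIPEDIA_API = "https://en.wikipedia.org/w/api.php"
# Wikimedia asks API clients to identify themselves; this value is a placeholder.
USER_AGENT = "honours-wikipedia-analysis/0.1 (contact: PLACEHOLDER)"

# Step 1: people with an "award received" (P166) statement for one award item,
# plus their English Wikipedia article if one exists. award_qid is a placeholder:
# the Wikidata item for each Order of Australia level must be looked up.
SPARQL_TEMPLATE = """
SELECT ?person ?article WHERE {{
  ?person wdt:P166 wd:{award_qid} .
  OPTIONAL {{
    ?article schema:about ?person ;
             schema:isPartOf <https://en.wikipedia.org/> .
  }}
}}
"""


def get_json(url, params):
    """GET url with the given query parameters and decode the JSON body."""
    request = urllib.request.Request(
        url + "?" + urllib.parse.urlencode(params),
        headers={"User-Agent": USER_AGENT},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def parse_sparql_rows(response):
    """Turn SPARQL JSON results into rows of Wikidata ID and Wikipedia URL (or None)."""
    rows = []
    for binding in response["results"]["bindings"]:
        rows.append({
            "wikidata_id": binding["person"]["value"].rsplit("/", 1)[-1],
            "wikipedia_url": binding.get("article", {}).get("value"),
        })
    return rows


def title_from_url(article_url):
    """Recover the page title from an English Wikipedia article URL."""
    return urllib.parse.unquote(article_url.rsplit("/wiki/", 1)[-1]).replace("_", " ")


def parse_page_id(response):
    """Step 2: read the page ID from an action=query&titles=... response (None if missing)."""
    for page in response["query"]["pages"].values():
        if "missing" not in page:
            return page["pageid"]
    return None


def parse_creation_timestamp(response):
    """Step 3: read the timestamp of the oldest revision, taken as the creation date."""
    page = next(iter(response["query"]["pages"].values()))
    return page["revisions"][0]["timestamp"]


def fetch_award_holders(award_qid):
    """Step 1 over the network: run the SPARQL query for one award item."""
    query = SPARQL_TEMPLATE.format(award_qid=award_qid)
    return parse_sparql_rows(get_json(WIKIDATA_SPARQL, {"query": query, "format": "json"}))


def fetch_creation_date(article_url):
    """Steps 2 and 3 over the network: title -> page ID -> oldest revision timestamp."""
    info = get_json(WIKIPEDIA_API, {
        "action": "query", "titles": title_from_url(article_url), "format": "json",
    })
    page_id = parse_page_id(info)
    if page_id is None:
        return None
    revisions = get_json(WIKIPEDIA_API, {
        "action": "query", "prop": "revisions", "pageids": page_id,
        "rvprop": "timestamp", "rvdir": "newer", "rvlimit": 1, "format": "json",
    })
    return parse_creation_timestamp(revisions)
```

Keeping the parsing separate from `get_json` means the response handling can be tested on saved API responses without hitting the live services.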

Exploratory analysis

Overview of Order of Australia honours

  1. How many have been awarded?

  2. What is the breakdown by state?
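Both overview questions come down to simple counts over the award rows. A minimal standard-library sketch is below. It assumes hypothetical `award_level` and `state` fields, uses made-up sample rows, and labels a missing state explicitly as "Unknown" rather than dropping it.

```python
from collections import Counter

# Hypothetical award rows (one per award). "award_level" and "state" are
# assumed field names; None marks a state that was not recorded.
awards = [
    {"award_level": "AM", "state": "NSW"},
    {"award_level": "OAM", "state": "VIC"},
    {"award_level": "OAM", "state": "NSW"},
    {"award_level": "AO", "state": None},
]


def count_by(records, field):
    """Count awards by one field, labelling missing values as 'Unknown'."""
    return Counter(r.get(field) or "Unknown" for r in records)


total = len(awards)
by_state = count_by(awards, "state")
by_level = count_by(awards, "award_level")
# Cross-tabulation of state by award level.
by_state_and_level = Counter((r.get("state") or "Unknown", r["award_level"]) for r in awards)

print(f"Total awards: {total}")
for state, n in by_state.most_common():
    print(f"{state}: {n} ({n / total:.1%})")
```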

How many Wikipedia pages are there for Order of Australia recipients?

  1. What proportion of Order of Australia recipients have a Wikipedia page?

  2. How does this differ by order level?

  3. Are there any differences by recipient state?
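These three questions share one calculation: the share of rows with a matched page, overall and within each group. The sketch below assumes a hypothetical `has_page` flag produced by the Wikidata/Wikipedia matching, and the sample rows are made up. It counts awards rather than people, in line with how the Honours data set is treated above. A per-person version would first keep one row per recipient.

```python
from collections import defaultdict

# Hypothetical matched rows (one per award). "has_page" is an assumed flag set
# by the Wikidata/Wikipedia matching; other field names are also assumptions.
matched = [
    {"award_level": "AC", "state": "NSW", "has_page": True},
    {"award_level": "AC", "state": "VIC", "has_page": True},
    {"award_level": "AM", "state": "NSW", "has_page": False},
    {"award_level": "OAM", "state": "VIC", "has_page": False},
    {"award_level": "OAM", "state": "NSW", "has_page": True},
]


def page_rate_by(records, field):
    """Return {group: (awards with a page, total awards, proportion)} for a field."""
    counts = defaultdict(lambda: [0, 0])
    for r in records:
        tally = counts[r[field]]
        tally[0] += int(r["has_page"])
        tally[1] += 1
    return {group: (hits, n, hits / n) for group, (hits, n) in sorted(counts.items())}


overall = sum(int(r["has_page"]) for r in matched) / len(matched)
by_level = page_rate_by(matched, "award_level")
by_state = page_rate_by(matched, "state")

print(f"Overall: {overall:.1%} of awards have a matched Wikipedia page")
for level, (hits, n, share) in by_level.items():
    print(f"{level}: {hits}/{n} ({share:.1%})")
```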

What can we learn about the page creation dates of those who have a Wikipedia page?

  1. How many had pages BEFORE or AFTER they received their Order of Australia? Is this different by order level?

  2. Does receiving an Order result in a spike in Wikipedia page creation?

  3. What is the rate of page creation? Have there been peaks? Has it slowed at any time?
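A rough sketch of how these questions could be approached is below. It assumes each row holds an award date and the page creation date from the revisions query (None when there is no page). All field names and sample rows are hypothetical.

For people with several awards, it shows one possible rule: compare the page against the earliest award. That rule is an assumption, not a settled choice. The "within a year" count is only a crude signal for a spike. A proper check would compare it against a baseline.

```python
from collections import Counter
from datetime import date

# Hypothetical rows (one per award); all field names are assumptions.
rows = [
    {"person_id": "p1", "award_level": "AM", "award_date": date(1990, 1, 26), "page_created": date(2006, 5, 1)},
    {"person_id": "p1", "award_level": "AO", "award_date": date(2010, 6, 14), "page_created": date(2006, 5, 1)},
    {"person_id": "p2", "award_level": "AC", "award_date": date(2015, 1, 26), "page_created": date(2015, 2, 3)},
    {"person_id": "p3", "award_level": "OAM", "award_date": date(2012, 6, 11), "page_created": None},
]


def timing(row):
    """BEFORE = page existed before the award; also NO PAGE or NO DATE."""
    if row["award_date"] is None:
        return "NO DATE"
    if row["page_created"] is None:
        return "NO PAGE"
    # A page created on the award date itself is counted as AFTER.
    return "BEFORE" if row["page_created"] < row["award_date"] else "AFTER"


def earliest_award(rows):
    """Keep one row per person: the award with the earliest known date."""
    first = {}
    for r in rows:
        if r["award_date"] is None:
            continue
        pid = r["person_id"]
        if pid not in first or r["award_date"] < first[pid]["award_date"]:
            first[pid] = r
    return list(first.values())


# Q1: BEFORE/AFTER by award level, per award and per person (earliest award).
by_level = Counter((r["award_level"], timing(r)) for r in rows)
by_level_first = Counter((r["award_level"], timing(r)) for r in earliest_award(rows))

# Q2: days from award to page creation; count pages created within a year after.
offsets = [(r["page_created"] - r["award_date"]).days
           for r in rows if r["award_date"] and r["page_created"]]
within_year = sum(1 for d in offsets if 0 <= d <= 365)

# Q3: pages created per calendar year, counting each person's page once.
page_dates = {r["person_id"]: r["page_created"] for r in rows if r["page_created"]}
created_per_year = Counter(d.year for d in page_dates.values())
```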

Random notes

These are small things I have found along the way that may not be important but are interesting or need follow-up.

  • Is there a large proportion of rugby and badminton players who have Wikidata entries? (It could be good to do some analysis on this sort of thing, i.e. the proportion of representation by description in Wikidata.)
  • Gender is in the data set; checking with Alex whether there is a way to get a broader gender classification.
  • How to handle people with multiple awards: multiple honour dates map to a single page creation date.
  • Need to find the entry for the nursing academic who is referenced on a Wikipedia page but has no page of her own, only a Wikidata entry (create a list of these people to review who they are). I think this is it: https://www.wikidata.org/wiki/Q47193440 (no page, but she is referenced here: https://en.wikipedia.org/wiki/Amorality | Megan Jane Johnson).
  • I think I found a few people who had non-English Wikipedia pages (one may have been the producer of Shine; need to check). Is this of interest? Possibly only a very small number. Could scrape the non-English pages to see in which languages they have a page. (Also: which Australians in general have been given pages in other languages?)
  • Can I do some “bag of words” analysis on the award description and see whether any areas result in more page creations than others? (Scientists vs politicians vs sportspeople, etc.? See the point above about representation in Wikidata descriptions as well.)
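A very simple version of the bag-of-words idea in the last note is sketched below. The sketch assumes a hypothetical `description` field holding the award text and a `has_page` flag. The stopword list is an illustrative guess, and the sample rows are invented. Each description is split into a set of lower-case words. For each word that appears in at least `min_count` descriptions, the sketch reports the share of those awards with a Wikipedia page. A real analysis would need a larger stopword list and some care with small counts.

```python
import re
from collections import Counter

# Hypothetical rows: award description text and whether the recipient has a
# Wikipedia page. Field names and sample text are assumptions.
rows = [
    {"description": "For service to science, particularly physics", "has_page": True},
    {"description": "For service to sport as a rugby player", "has_page": True},
    {"description": "For service to the community of Ballarat", "has_page": False},
    {"description": "For service to science education", "has_page": False},
]

# Illustrative stopword list; common filler words in award descriptions.
STOPWORDS = {"for", "service", "to", "the", "of", "as", "a", "and", "particularly"}


def words(text):
    """Lower-case word tokens with stopwords removed, as a set (one count per description)."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def page_rate_by_word(rows, min_count=2):
    """For each word in at least min_count descriptions, return the share with a page."""
    seen = Counter()
    with_page = Counter()
    for r in rows:
        ws = words(r["description"])
        seen.update(ws)
        if r["has_page"]:
            with_page.update(ws)
    return {w: with_page[w] / n for w, n in seen.items() if n >= min_count}


print(page_rate_by_word(rows))
```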

Data Limitations

The data set from the Prime Minister's office is incomplete: some award issue dates are missing (ADD NUMBER HERE).
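The missing-date figure could be produced with a count like the one below. This is a minimal sketch. It assumes a hypothetical `award_date` field in which an empty string or None means the issue date is missing, and the sample rows are made up.

```python
from collections import Counter

# Hypothetical honours rows; "award_date" is an assumed field name, and an
# empty string or None is taken to mean the issue date is missing.
awards = [
    {"award_level": "AM", "award_date": "1990-01-26"},
    {"award_level": "OAM", "award_date": ""},
    {"award_level": "OAM", "award_date": None},
    {"award_level": "AO", "award_date": "2010-06-14"},
]


def missing_dates(records):
    """Return (number of awards with no date, missing count per award level)."""
    missing = [r for r in records if not r.get("award_date")]
    return len(missing), Counter(r["award_level"] for r in missing)


n_missing, missing_by_level = missing_dates(awards)
print(f"{n_missing} of {len(awards)} awards have no issue date")
```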

Gender: record the method Alex used (as per email).